Retrieving Knowledge from Technical, Manually Indexed Corpora

Authors

  • Isabelle Moulinier
  • Catherine Gouttas
Abstract

In technical fields, experts have manually indexed a huge collection of texts. For this purpose, they used thesauri, which structure sets of available keywords. Approaches in automatic indexing have made extensive use of thesauri. However, our belief is that automated systems do not wholly take into account the experts' knowledge. We thus present a method to extract that kind of knowledge from manually indexed technical corpora. We combine a linguistic analysis with a learning algorithm. This results in a set of indexing rules covering a domain specified by a thesaurus.
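
The abstract does not specify the linguistic analysis or the learning algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a crude word tokenizer in place of linguistic analysis and a simple co-occurrence threshold in place of the learning algorithm; the function names, the thresholds, and the toy corpus are all hypothetical.

```python
import re
from collections import Counter, defaultdict


def extract_terms(text):
    """Crude stand-in for linguistic analysis: the set of lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def learn_rules(corpus, min_support=2, min_precision=0.8):
    """Learn rules of the form 'term present -> assign keyword'.

    corpus: list of (text, keywords) pairs, where keywords is the set of
    thesaurus keywords an expert assigned to the text.
    A rule is kept when the term and keyword co-occur in at least
    min_support documents and the keyword is present in at least
    min_precision of the documents that contain the term.
    """
    term_count = Counter()
    pair_count = Counter()
    for text, keywords in corpus:
        terms = extract_terms(text)
        term_count.update(terms)
        for term in terms:
            for keyword in keywords:
                pair_count[(term, keyword)] += 1

    rules = defaultdict(set)
    for (term, keyword), n in pair_count.items():
        if n >= min_support and n / term_count[term] >= min_precision:
            rules[term].add(keyword)
    return dict(rules)


def apply_rules(rules, text):
    """Index a new text by firing every rule whose term occurs in it."""
    assigned = set()
    for term in extract_terms(text):
        assigned |= rules.get(term, set())
    return assigned


# Hypothetical manually indexed corpus (illustrative data only).
corpus = [
    ("Corrosion of the steel pipe in the cooling circuit", {"CORROSION", "PIPING"}),
    ("Pipe rupture caused by corrosion", {"CORROSION", "PIPING"}),
    ("Turbine blade vibration measurements", {"TURBINE"}),
    ("Vibration of the turbine shaft", {"TURBINE"}),
]
rules = learn_rules(corpus)
suggested = apply_rules(rules, "Corrosion observed on a pipe")
```

In this toy setting, frequent function words such as "of" and "the" produce no rule because they do not co-occur consistently with any single keyword. A real system would rely on proper term extraction and a stronger learner.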

Similar resources

Speech Recognition and Information Retrieval: Experiments in Retrieving Spoken Documents

The Informedia Digital Video Library Project at Carnegie Mellon University is making large corpora of video and audio data available for full content retrieval by integrating natural language understanding, image processing, speech recognition and information retrieval. Information retrieval from corpora of speech recognition output is critical to the project’s success. In this paper, we out...

Generating Concise Rules for Retrieving Human Motions from Large Datasets

This paper proposes a method for retrieving human motion data with concise retrieval rules based on the spatio-temporal features of motion appearance. Our method first converts a motion clip into a form of clausal language that represents geometrical relations between body parts and their temporal relationship. A retrieval rule is then learned from the set of manually classified examples using in...

Domain Specific Sense Disambiguation with Unsupervised Methods

Most approaches in sense disambiguation have been restricted to supervised training over manually annotated, non-technical, English corpora. Application to a new language or technical domain requires extensive manual annotation of appropriate training corpora. As this is both expensive and inefficient, unsupervised methods are to be preferred, specifically in technical domains such as medicine....

Text-Image Interaction for Image Retrieval and Semi-Automatic Indexing

This paper addresses the issue of retrieving images based on visual content, paying particular attention to the semantic dimension of information retrieval. A brief review of existing Image Retrieval Systems is provided, highlighting a major drawback of these prototypes, namely the lack of integration between classical "semantic search" and visual similarity retrieval (i.e. content-based re...

Retrieving Domain-Specific Collocations by Co-occurrences and Word Order Constraints

In this paper, we describe a method for automatically retrieving collocations from large text corpora. This method comprises the following stages: (1) extracting strings of characters as units of collocations, and (2) extracting recurrent combinations of strings as collocations. Through this method, various types of domain-specific collocations can be retrieved simultaneously. This method is pr...

Publication date: 2007